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ABSTRACT 


This paper presents a natural language processing based automated system 
called DrawPlus for generating UML diagrams, user scenarios and test cases 
after analyzing the given business requirement specification which is written in 
natural language. The DrawPlus is presented for analyzing the natural languages 
and extracting the relative and required information from the given business 
requirement Specification by the user. Basically user writes the requirements 
specifications in simple English and the designed system has conspicuous ability 
to analyze the given requirement specification by using some of the core natural 
language processing techniques with our own well defined algorithms. After 
compound analysis and extraction of associated information, the DrawPlus 
system draws use case diagram, User scenarios and system level high level test 
case description. The DrawPlus provides the more convenient and reliable way 
of generating use case, user scenarios and test cases in a way reducing the time 
and cost of software development process while accelerating the 70% of works 
in Software design and Testing phase 
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I. INTRODUCTION 

Software design and software testing is known as some of 
the key critical areas in the software development life cycle 
which takes considerable amount of time and cost to a 
software project. Basically in the software design phase 
system analysis are design the software by using various 
diagrams (UML) or charts bases on requirement 
specification given by the business. In the current process of 
designing a software almost every design is done by 
manually by utilizing large amount of human effort and can 
have some reliability issues with the design. 

When it's come to the testing phase it is the last phase of 
SDLC before software is delivered and in this phase software 
is test against the requirements. Here generating test cases is 
one of the main task and testers face many problems when 
generating test cases such as ambiguity in the requirements, 
huge time takes to read and understand the requirement 
specification etc. By considering the all of those problems we 
came up with an integrated software design tool which is 
capable of generating use case, user scenarios and Test cases 
for the given requirement 

Specification in a way accelerate the Design and Test phase 
of the SDLC. So that this paper proposes an approach to 
generate UML diagrams, user scenarios test case from 
software requirements expressed in natural language using 
natural language processing techniques. Basically, system 
takes and preprocess the requirements specification as the 
input, then use NLP core techniques such as sentence 
detector, tokenizer, pos (part-of-speech) tagger and de- 


tokenizer with our own algorithms to capture and filter only 
necessary parts of the given requirement specification. 
Finally system produce outputs in three forms such and UML 
diagrams, user scenarios and test cases. 

II. METHODOLOGY 

For the domain of this research has carried out there are 
always limited ways of stating a requirement. Hence it's a 
matter of catching each possible ways of stating a 
requirement. But increase in the Grammar array may reduce 
performance. For each grammar we have defined, there is a 
corresponding rule to be applied in order to perform the 
actor and function extraction. There for there comes the Rule 
set. As a result of this research project, the research team 
was able to introduce few new algorithms to be used with 
Open NLP libraries. The algorithm is applicable when the 
concept of grammar is involved. Grammar is composed of 
multiple number of tags with the expected sequence of the 
tags. 

The algorithm processes one sentence at a time. Initially the 
grammar matching algorithm receives a sentence with all the 
words are tagged [5] by the POS Tagger [1]. The entire 
function is running on top of a nested loop consist of two 
nested loops. The algorithm then has three loops inside, each 
is nested to the other. Altogether it makes 5 loops nested to 
each other and containing more than 13 conditions in 
between. The Grammars and Rules are defined and store in a 
two dementional array. For each grammar there we have 
defined, thers is a coresponding Rule. The rule contains 
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information and instructions to follow when a particular 
grammar was hit. The correct rule for a particular grammar 
can always be refered by the index number of the grammar. 
The Figure 1 shows how the above mentioned loops are 
nested and the connection and the purpose of each loop. 
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Figure 2 - Comparison efficiency 



Figure 1 - Loop structure 

Considering the 3 loops inside the algorithm, the first loop 
concerns about the grammar number which is now being 
compared, and the second loop concerns about the tokenized 
tagged sentence. The tokenized sentence is prepared by 
tokenizing the tagged sentence using the Open NLP 
Tokenizer [3]. The third and the last loop concerns about 
multi valued tags if defined any in the selected grammar. 
Multi valued tags are cases where the grammar indicates, in 
a particular tag position whether there can be any 
alternative tags. This technique is lately added to improve 
the efficiency of the algorithm. With the multi-valued check 
technique, we were able to reduce more than half of the 
grammars and number of rules defined in the method. This 
technique also improves the overall efficiency of the entire 
process. 

Since the first loop concerns about the Grammar rules 
number, increment in a single grammar would cause a 
massive number of comparisons inside the next two loops. 
As an average value for a description with 12 sentences, the 
algorithm makes 710 comparisons for 22 grammars defined. 
Therefore reduction in the number of grammars effect all 
three inside loops directly. The multi-valued tags are 
identified by the"/” sign i n between two tags in a single tag 
position. The efficiency rate is shown below in the Figure 1 
and Figure 2. The tests were carried out by testing giving 
random sentences as the input for the program. Test 1 
contains 12 sentences, test 2 contains 10 sentences and test 
3 contains 6 sentences. The test result show that the 
technique has an effective and direct impact for the 
performance of the overall process. As the decrease in the 
grammar resulted in a decrease in number of comparisons, 
the total time taken to process the data were also affected 
and resulted a decreased. The test results shown in Figure 2 
and Figure 3 shows direct and effective outcomes of the 
application of the multivalued tag check into the grammar. 
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Figure 3 - Duration efficiency 

There can be cases where this techniques may not show any 
performance, cases where any sentences doesn't match with 
a grammar that does have a multi-valued tag. When finding a 
matching grammar, whenever small segments in the 
grammar is matched with the segments in the sentence, the 
indexes of the matched tokens are saved in a string array to 
be used when locating the actions and actors. The Grammars 
are stored in a special order. First comes the longest in 
length to avoid conflict where one grammar can be consisted 
inside another. 



Figure 4 - Grammar matching algorithm 


@ IJTSRD | Unique Paper ID - IJTSRD22900 | Volume - 3 | Issue - 3 | Mar-Apr 2019 


Page: 956 


























International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com elSSN: 2456-6470 


The Figure 4 shows the wireframe of the Grammar matching 
algorithm. The third block of the figure 4 refers to an array 
used for a special purpose in this algorithm. The array 
simply bridges the sentence and the Rule. Then indicates 
where the grammar findings lie inside the original sentence. 
The content inside the Rule (last block in figure 4) show 
where to look for. The numbers at the first index in each 
element refers to the special array in section three. Where 
ultimately it is pointed to the indexes of the original 
sentence. When a sentence completely matches a grammar, 
the matched grammar number in the grammar set is passed 
into two algorithms named getActorQ and getActionQ. The 
both algorithms take grammar number as the input and 
apply the rules defined specifically for the matched grammar 
number which will produce the ultimate result, the actor and 
the actions executed by the same actor. The algorithm can 
manage advanced relationships between the data. 

The algorithms is capable of identifying multiple actors with 
their actions mentioned in the same sentence and multiple 
actions mentioned to be executed by a particular actor. The 
algorithm is further developed to support co-reference in an 
advanced manner. When a grammar was hit the mentioned 
actor is stored in a variable carries forward till the next actor 
gets a hit. Meanwhile if an action is caught without a proper 
actor and if the sentence matches the right grammar 
requirements, the previous actor that was carried forward 
will be patched as the actor. The algorithm is also 
maintaining two variables to get statistics of hit counts and 
miss counts when matching a grammar. The two count 
variables will be resetted each time it is switching between 
grammars. This count can be used to prompt the users as an 
indication of the confidence level of each segment of the 
result. 

Since the output of these algorithms can complex as the data 
is related to one another, these data cannot be stored in 
regular variables available in JAVA [4]. Therefore we have 
created a data structure to maintain data, special methods to 
retrieve the data or insert new data. 

When it comes to the the UML design phase main input is an 
XML file which is generated form NLP core. For that a rich 
algorithm was implemented. Initial location of diagram is 
provided in the first step. The algorithm is executed in this 
flow. First actor of the XML is identified and get number of 
functions intended for that actor. According to that X, Y 
location 1st actor is drown in left side. Then get that actor X 
axis value, changing Y axis value and draw the 1st function 
from the function set. Define a constant value to put a space 
in between two functions. The remaining functions are 
drown related to the selected actor by changing X axis value. 
Y axis value needs to be changed for all the actor up until left 
side allocated space is over. 2nd actor Y alliance is taken by 
using this formula. 

Whole functions area= number of function count*(function 
high + difference) 

Next actor location= whole functions area + difference 

Another condition is to be checked before draw the actor. 
End point for last function for the next actor <= window size. 
If this condition is false then that should be draw in left side 
else it should be move to opposite side. 


To draw arrow in between actors is needed two parameters. 
Actor location and the function location are the both location 
and actor location is taken by dividing the height of the actor 
and function location is taken by getting value of X axis. 
When the actor move to opposite side these two parameters 
are changed according to that location. 

III. CONCLUSION 

In this paper, we have presented a structure of modeling use 
case diagram, use case scenario and high level test case 
description extracted from the given description. We have 
introduced an intelligent algorithm to match the grammar of 
many different form. The algorithm is smart enough to notify 
the user the certainty of the particular piece of findings. 
From this algorithm we can identify the relationship of 
actors and their functions. 

The results shows evidence that improving the algorithm is 
much more effective than increasing the number of grammar 
to be checked. Which elevated the total performances of the 
system. The accuracy can be increased by adding more and 
more grammar rules into the application. Therefore we 
figured out that each step we take trying to improve the 
accuracy the performance of the software reduces. But the 
fact is no user is willing to get the chance of having a possible 
faulty results over an extra few seconds that they have to 
spend more. 
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